Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR

نویسندگان

  • Felix Weninger
  • Hakan Erdogan
  • Shinji Watanabe
  • Emmanuel Vincent
  • Jonathan Le Roux
  • John R. Hershey
  • Björn W. Schuller
چکیده

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ‘näıvely’ as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76 % average word error rate, which is, to our knowledge, the best score to date.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks

Long Short-Term Memory (LSTM) recurrent neural network has proven effective in modeling speech and has achieved outstanding performance in both speech enhancement (SE) and automatic speech recognition (ASR). To further improve the performance of noise-robust speech recognition, a combination of speech enhancement and recognition was shown to be promising in earlier work. This paper aims to expl...

متن کامل

Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments

This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences of noisy and reverberant with undistorted speech features. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Co...

متن کامل

Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?

Using deep neural networks (DNNs) for automatic speech recognition (ASR) has recently attracted much attention due to the large performance improvement they provide for a variety of tasks. DNNs are known to be robust to overfitting and to be able to remove speaker variability. Another important cause of variability in speech is the presence of noise. A lot of research has been undertaken on noi...

متن کامل

The Munich Feature Enhancement Approach to the 2nd Chime Challenge Using Blstm Recurrent Neural Networks

We present a highly efficient, data-based method for monaural feature enhancement targeted at automatic speech recognition (ASR) in reverberant environments with highly non-stationary noise. Our approach is based on bidirectional Long Short-Term Memory recurrent neural networks trained to map noise corrupted features to clean features. In extensive test runs, enhanced features are evaluated wit...

متن کامل

Low-dimensional recurrent neural network-based Kalman filter for speech enhancement

This paper proposes a new recurrent neural network-based Kalman filter for speech enhancement, based on a noise-constrained least squares estimate. The parameters of speech signal modeled as autoregressive process are first estimated by using the proposed recurrent neural network and the speech signal is then recovered from Kalman filtering. The proposed recurrent neural network is globally asy...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015